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COMPUTING MODULAR POLYNOMIALS IN QUASI-LINEAR 

TIME 

ANDREAS ENGE 

Abstract. We analyse and compare the complexity of several algorithms for 
computing modular polynomials. Under the assumption that rounding errors 
do not influence the correctness of the result, which appears to be satisfied 
in practice, we show that an algorithm relying on floating point evaluation of 
modular functions and on interpolation has a complexity that is up to loga- 
rithmic factors linear in the size of the computed polynomials. In particular, 
it obtains the classical modular polynomial 4v of prime level i in time 

O (£ 2 log 3 l M{1)) C O {l 3 log 4+e I) , 

where M(t) is the time needed to multiply two £-bit numbers. 

Besides treating modular polynomials for T°(£), which are an important 
ingredient in many algorithms dealing with isogenies of elliptic curves, the 
algorithm is easily adapted to more general situations. Composite levels are 
handled just as easily as prime levels, as well as polynomials between a modular 
function and its transform of prime level, such as the Schlafli polynomials and 
their generalisations. 

Our distributed implementation of the algorithm confirms the theoretical 
analysis by computing modular equations of record level around 10000 in less 
than two weeks on ten processors. 



1. Definitions and main result 

Modular polynomials, in their broadest sense, are bivariate polynomials with 
a pair of modular functions as zero. Given any two modular functions / and g 
for arbitrary congruence subgroups, the function fields C(/) and C(g) are finite 
extensions of C(j), so that there are two polynomials relating / resp. g to j. Taking 
the resultant of these polynomials with respect to j, one sees that a polynomial 
relationship between / and g exists. In practice, one is rather interested in the 
minimal polynomial of / over C(g), say, that will be called the modular polynomial 
of / with respect to g. If the functions satisfy some conditions on the rationality and 
integrality of their q-expansion coefficients, the modular polynomial has rational 
integral coefficients. 

Different modular polynomials parameterise moduli spaces related to elliptic 
curves. Let V = S1 2 (Z)/{±1} = PS1 2 (Z) be the full modular group, and Cr = C(j) 
the field of modular functions invariant under T; j itself paramctcrises isomorphism 
classes of elliptic curves. Of special interest for applications is the congruence 
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for t prime; its modular polynomial parameterises isomorphism classes of elliptic 
curves together with an isogeny of degree t. This is heavily used in Atkin's and 
Elkies's improvements to Schoof's algorithm for counting points on elliptic curves 
[5j. Other applications include the computation of the endomorphism ring of an 
elliptic curve [25] [17] or of an isogeny between two elliptic curves [19] . 

More general modular polynomials occur in modern complex multiplication con- 
structions for elliptic curves [HI [13l Qj] ; they are discussed in Section [4] When 
counting points on an elliptic curve by the Schoof-Elkies-Atkin algorithm, the 
modular polynomials are usually supposed to be available after a precomputation 
phase. This makes sense in situations where the algorithm is carried out many times 
for curves of about the same size, as is typical in cryptography. When establishing 
point counting records [12], however, one quickly realises that the computation of 
the modular polynomials itself becomes a bottleneck that may even dominate from 
a complexity theoretic point of view. 

In this article, after surveying in Section [5] the state of the art as it appears 
in the literature, I present an algorithm based on evaluation and interpolation 
that computes modular polynomials in time essentially linear in the bit size of 
the polynomial. As it is based on floating point computations, one has to assume 
that rounding errors do not disturb the output, a heuristic that is supported by 
the implementation described in Section [5] A precise analysis of rounding errors to 
obtain a rigorously proved result appears to be out of reach. Note, however, that the 
correctness of the output may be checked probabilistically. For instance, one may 
instantiate the variable j occurring in the modular polynomial by the j-invariant 
of an elliptic curve over a finite field and verify that the specialised polynomial has 
the expected splitting behaviour. 

Under the assumption that it is sufficient to use a floating point precision of 
0(n) for a polynomial whose largest coefficient has n digits, the following holds: 

Result 1.1 (heuristic). LetV C T fee a congruence subgroup, f a modular function 
for V and $>(X) € C(j)[X] the characteristic polynomial of f with respect to the 
function field extension <Cr'/C(j). Assume that <I> has coefficients in and that 
the largest integer coefficient occurring in it has n digits. Suppose that a system of 
representatives ofT'\T is known and that f can be evaluated at precision 0(n) in 
time 0(\ognM(nj), where M(n) C O (nlog +e ) is the time needed to multiply two 
numbers with n digits. Then there is an algorithm that computes $ in time 

O (deg x <E> degj <E> ( log 2 max(deg x <&, deg^ <&) + logn) M(n)) . 

In particular, the classical modular polynomial such that &e(j(z), j(£z)) = for 
I prime is obtained in time 

O if log 2 lM{l\og()) C O (£ 3 log 4+e I) . 

If $ is a dense polynomial and all of its coefficients (or at least a constant 
fraction of them) are of bit size about n, then this algorithm is linear in the size of 
the polynomial, except for the logarithmic factors. 

The time bound of 0(log n M(n)) for evaluating modular functions at precision n 
is motivated by an algorithm due to Dupont; it relies on Newton iterations on an 
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expression involving the arithmetic-geometric mean and reaches this complexity for 
a wide class of modular functions, including those built from Dedekind's 77 function 
and in particular k, k' and j Theorem 5 and Section 7.2]. Alternatively, one 
may use an algorithm relying on multievaluation of g-expansions as described in 
[TOl Section 6.3]. Its complexity, worse by a factor of logn, is still sufficient for 
the running time of Result fTTTI to hold. The bound M(n) € O (nlog 1+£ ) may be 
achieved by the classical Schonhage-Strassen algorithm 30 or the more recent and 
asymptotically faster algorithm due to Fiirer [18] . 

The roots of <&, that is, the algebraic conjugates of /, are given by the f(M v z) 
with T'My running through a system of representatives of the residue classes T'\T 
(see [6]), so that 

(1.1) hx) = nV - f(M vZ )) = + ^ c r X^- r , 

i>—l r—l 

where the coefficients c r of $ are the elementary symmetric functions of the conju- 
gates f{M v z). 

By the (/-expansion principle (cf. [6l §3]), a sufficient condition for the c r to be 
polynomials in is that / has rational integral g-coefHcients, is holomorphic in 

i = {2 € C : 3z > 0} and all conjugates / (§j^pf) with f° ^\ e T have integral 

algebraic g-coefHcients. 

2. Approaches to computing modular polynomials for T°(£) 

The case of T (£) with £ prime, which is the most important one for applications, 
is also the simplest one from a theoretical point of view: The residue classes T°(^)\r 
are represented by 

T u =(l f\ > u = 0,...,£-l, and 5= ~ J 

0V* 



(The literature often concentrates on T {£) = T ^ n T. Since 

T (£) = S'- 1 r (^)5, a modular function / for T°(£) yields the function f(Sz) for 
Tq(£), and f(Sz) is actually a conjugate of /. So both result in the same modular 
polynomial $. Or phrased differently: Cro(f)/Cr is not normal, and another one 
of its embeddings is Qr m/Cr-) 

Different functions have been suggested in the literature: 

• j(z/£), leading to the so-called "classical" or "traditional" modular poly- 
nomials; 

• K> e (z) 2s = S for s = 12/gcd(12,^ - 1), yielding the so-called 
"canonical" modular polynomials; 

• functions that are moreover invariant under the Fricke-Atkin-Lehner invo- 
lution 2i-> =£• in particular, functions of the form 

T r ( V (z) V (£z)) 
j](z)r\(£z) 

evaluated in — with T r the r-th Hecke operator and linear combinations 
thereof as described in [37J Ch. 5.3]. 



4 



ANDREAS ENGE 



All these functions lead to modular polynomials with coefficients in Z. 

2.1. Heights of modular polynomials. The (logarithmic) height of a modular 
polynomial with coefficients in Z is defined as the logarithm of the largest absolute 
value of the coefficients. It provides a lower bound on the arithmetic precision 
required to compute the polynomial. For the classical polynomial between j(z) and 
j(z/£), the height is given by 

6(£+l)Qogi+0(l)) CO(£log£) 

according to [3]. For alternative modular polynomials, one observes a similar growth 
of the coefficients with £, but the constant is usually smaller. It should be possible 
to adapt the argumentation in [3] to obtain a similar bound for further classes of 
modular polynomials. 

2.2. Using (/-expansions. The system of representatives of r°(^)\r and thus the 
conjugates of the modular function / being explicitly known, one may use (|1.1[) to 
compute the modular polynomial $, if only a procedure for recognising its coeffi- 
cients c r as polynomials in Z[j] is available. The straightforward and most popular 
approach is to use g-expansions: The expansion of j is of the form g -1 + • • • , so 
that knowing the expansion c r = c rj fcg + • • ■ > one deduces that c r has leading 
term c r ^j k as a polynomial in j; one subtracts the g-expansion of this term and 
continues in the same vein. In fact, only the non-positive g-powers in the series 
expansions of the c r are needed. Several variants of the algorithm are described in 
PS] with a brief complexity analysis in [5J Section 3] . We shall describe only the 
algorithm with the best complexity and give a detailed analysis of its running time. 

Let / be a modular function for T°(£) with valuation v < at infinity; that is, 
/ admits a g-expansion of the form 

oo 

/(*) = £ a k q k / e G £{{q 1/l )) with = e 2 ™^ . 

k=vt 

Suppose that the a k are rational integers. (For instance, / = j(z/£) resp. / = tt)| s 
are valid choices with v = —l/l resp. v — — •) Then the g-expansions of 
the conjugates /„ = j(T u z) are of the same form with the coefficients a k being 
multiplied by powers of Q — e 27T ^ e . 

The last conjugate /oo = f(Sz) poses difficulties since it cannot be described 
generically, q (— ) = q~ 27Tl / z being unrelated to q in a simple way, and is thus 
treated separately. (And it is the fact that there is only one of them that makes 
the case of V = T° (£) stand out so that it is comparatively easily handled via the 
g-expansion approach.) Denote by the order of /oo at infinity. For / = j(z/£), 
one has /oo = j(£z) = q~ l + ■ ■ ■ and t>oo = ~£, and for / = ro 2s , the transformation 

behaviour of r) yields /oo = (v^^) 2S = £V ( ^ 1)/12 + with Voc = = 
i-i 

gcd(12,f-l) ' 

In particular, Vqq may be positive or negative. If it is negative, the term with 
lowest g-exponent occurring in (jl.ip is /oo ■ l~[y=o f v °f exponent £v + Woo ; so the 
degree of $ in j is l^w + Wooli reached for q+i. For instance, the degree in j is £+ 1 
for the classical modular polynomial $£. If i>oo is positive, the term with lowest 
g-exponent is n^=o fvi so the degree \£v\ of $ in j is reached in eg. For instance, 
the degree in j of the canonical modular polynomials is . 
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We first consider the complexity of determining the partial product 

t-i e 

H(X- /„) = X 1 + Y / ~CrX e - r e Z[Ce}((q^))[X}. 

u=0 r=l 

One may proceed by computing the Newton sums s r = X)^=o fv f° r 1 — r — & an< i 
use Newton's recurrence formulas to deduce the c r . Following [26], one obtains with 
f = EZrvi a Kr q k/l that 

OO 

(2.1) s r = ia ek, r q k e Z((g)), 

k= [rv~\ 

so that in fact the roots of unity do not interfere with the computations. One sees 
that the valuations of the s r at infinity are bounded below by \rv\ (and in general, 
they will be equal to it). Assume that d+1 terms of them are known, that is, s r is 
known up to and including the coefficient of q^ rv ^ +d . Then one shows by induction 
on Newton's relation 



(2.2) cr = i£(-l) fe+1 



s k c r 

r f — ' ' 

k=l 

that the valuation of c r is also bounded below by \rv\ and that it is also known up 
to q^+ d . 

If Voo > 0, we need only the terms of the c r with non-positive exponents; for 
Woo < 0, we loose some precision, in fact \voo \ terms, through the final multiplication 
by X — /oq. Thus, the value required for d is given by 

d = l\v\ + max(0, — ^oo) = degj 

This is an indication that the algorithm given above should be optimal among those 
relying on g-expansions: Outputting polynomials with up to deg^ $ + 1 coefficients, 
it manipulates series with as many terms. Unfortunately, the computation of the 
s r involves decimating the series for f r by I. Eventually, a factor of t is lost in the 
running time since 0(£d) terms of the f r are needed. 

Once the g-coefficients of / are known, the f r and s r are thus computed by 0(1) 
multiplications of series with 0(£d) terms and the c r by 0(£ 2 ) multiplications of 
series with 0(d) terms. Let M q (d) be the number of arithmetic operations in Z 
required to multiply two dense g-expansions with d terms. As M q is at least linear, 
the total complexity of the series computations becomes 

0{£M q (£d)). 

For functions / related to r\ quotients, such as / = Vo\ s or / = j(z/£) = 

( ro 3 4 (z/l) + 16) 3 



z/i) 



, the effort for computing their g-expansions is negligible, since it cor- 
responds to a constant number of arithmetic operations with series starting from 
the easily written down series expansion of r\. 

To write the c r as polynomials in j, one has to compute the non-positive parts 
of the g-expansions of the j k for 1 < k < d. This can be done in time 0(d M q (d)), 
which is negligible compared to the previous steps because d € 0(£). Identifying the 
c r as polynomials then corresponds to solving triangular systems of linear equations, 
requiring altogether O (£d 2 ) operations with integers. 
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The total complexity thus becomes 

0(£M q {£ 2 )) 

integer operations. 

Let n € O(^log^) as in Section |2~T1 be a bound on the height of the modular 
polynomial. Then the bit complexity of the algorithm is given by 

O (£ 3 log £M(n)) C O (£ 4 log 3+£ £) 

In [3j Appendix A] it is argued that in fact the running time is higher, on grounds 
that the coefficients of the powers of j grow faster than 0(£\og£). However, this 
objection can be dismissed by carrying out all computations modulo a sufficiently 
large prime of bit size n S 0{£\og£). Alternatively, and probably preferably in prac- 
tice, one may work modulo small primes and use the Chinese remainder theorem. 
Both approaches yield the desired complexity. 

Remark. If is sought only modulo some prime p, then the complete algorithm 
can be carried out modulo p as soon as p is larger than £ to make the divisions 
in (|2.2p possible. The bit complexity becomes 

0(£ 3 log^logp(loglogp) 1+e ) . 

2.3. Charles Lauter. In [3j, the authors describe an algorithm to compute the 
classical modular polynomial directly modulo a prime p without recourse to 
g-expansions of modular functions. Instead, they rely on the moduli interpretation 
of <f>£, which characterises pairs of ^-isogenous elliptic curves. So the algorithm 
manipulates only elliptic curves and explicit isogenies over extension fields of ¥ p . 

Its basic building block is the computation of the instantiated polynomial <E> = 
<5>e(X, j) with j € ¥ p 2 the j-invariant of a supersingular elliptic curve E. The roots 
of <i> are the j-invariants of the £ + 1 curves that are £-isogenous to E, and that may 
be obtained via Vlu's formulae [3T] once the complete £-torsion of E is known. These 
are £ 2 points defined over an extension of F p 2 of degree 0(£) in the supersingular 
case; in fact, one expects a degree of Q(£) virtually all of the time. Thus writing 
down the ^-torsion points requires a time of fl(£ 3 logp). 

After having repeated the procedure for deg 3 §i = Q(£) different values of j, 
one obtains the coefficients of $^ modulo p by interpolation. (This is analogous to 
the floating point approach of Section [31 and more details are given there.) The 
complexity of the algorithm is thus at least 

tl(£Hog P ), 

and an upper bound of O (£ A+E log 2+e p + log 4+e p) is proved in [3j Theorem 3.2] 
under the generalised Riemann hypothesis. 

Hence, this algorithm is slower by a factor of order £ than the one presented 
in Section 12.21 reiving on q-expansions. This is due to the costly determination of 
isogenies via their kernels: While the isogenies are defined over F p 2, their kernels 
lie in an extension of degree 0(£). The q-expansions of the conjugates of j(z/£), on 
the other hand, provide a synthetic description of the curves that are £-isogenous 
to the one with j-invariant j(z), and using them it is sufficient to carry out all 
computations in F p in order to obtain $f mod p. 
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3. EVALUATION-INTERPOLATION 

The approach for calculating modular polynomials that is described in this sec- 
tion is in fact neither new nor particularly involved. It has been suggested to me 
by R. Schertz during our work on the class invariants of [14] , and later R. Dupont 
pointed out to me that it can already be found in [H Chapter 4.5]. However, it 
has not been noticed before that the algorithm allows to lower the exponent of the 
computational complexity by 1 and thus to reach an essentially optimal complexity 
if fast polynomial arithmetic and fast techniques for evaluating modular functions 
as described in [10] are used. 

The basic idea of the evaluation-interpolation approach is to specialise and to 
compute the identity (jl.ip between modular functions in several complex floating 
point arguments (the evaluation phase); and then to interpolate the coefficients in 
order to recognise them as polynomials in j. Precisely, (jl.ip can be rewritten as 

[r : r'] [r:r'] 
(3.1) *(X,j(z)) = [] ( X ~ f( M »z)) = xlr ' T ' ] + E c r (z)X^- r 

for all z in the upper complex half plane. 

Evaluating the conjugates f{M u z) of / in a number of complex arguments Zk 
with Qzk > 0, multiplying out the left hand side as a polynomial in C[X] and 
separating the coefficients according to powers of X yields the values c r (zk). Writing 

c r = Y^s^o * c r.sj d ° Sj the Cr(zk) are actually the values of the polynomial c r in 
j(Zk), so that the coefficients c ryS £ C can be retrieved by interpolation as soon as 
the c r (zk) are known for degj $ + 1 values of Zk- The final step is to round the c r . s to 
rational integers. The degree of the modular polynomial in j or an upper bound on 
it may be known beforehand; if it is not, then the algorithm can be repeated with 
increasing guesses for deg^ $ until rounding to integers succeeds (and doubling the 
guess at each time results in the same asymptotic complexity as taking the correct 
value from the beginning). 

Let E(n) be the bit complexity for evaluating the modular functions / or j at 
a precision of n bits (here, / is considered to be fixed, while n tends to infinity 
with £). Then the evaluation phase requires (£ + l)(degj $ + 1) evaluations of / at 
a cost of 

0(£ degj $E(ri)) 

and the reconstruction of degj $ + 1 polynomials of degree £ + 1 from their roots. 
Multiplying complex polynomials by the FFT, this reconstruction step takes 

O (deg 3 $ £ log 2 £M(n) + log n M(n)) , 

where the (eventually negligible) term log nM(n) accounts for the computation of 
a primitive root of unity of sufficiently high order to carry out all the FFTs, and 
as usual M(n) is the time needed to multiply two n-bit numbers. 

The interpolation phase consists of £ interpolations of polynomials of degree 
degj Employing again fast algorithms such as [20], Algorithm 10.11], this takes 

O (£ de gj , $log 2 (de gi $) M{n)) , 

once the roots of unity are available. 

It is shown in [7] that among others, Dedekind's 7/ function can be evalu- 
ated at precision n in time E{n) £ O (log nM{n)) uniformly for the argument 
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in T = jzei: \dtz\ < |, \z\ > l}. (The restriction to T is not crucial since we 
may freely choose our interpolation points.) So in particular, all functions built 
from 77 such as j and Wi satisfy E(n) £ O (logn M(n)). (Alternatively, one may 
use multievaluation as described in [101 Section 6.3] with a slightly worse amortised 
cost of O (log 2 nM(n)) per value, which still allows to reach the final complexity 
of Result O) 

In total, the steps add up to a running time of 

O ((£ degj $ log 2 max(^, deg^ $) + log n)M (nj) 
C O ((deg x $ degj 3> log 2 max(deg x deg^ $) + log n)M (n)) , 

and the latter formulation is valid for arbitrary congruence subgroups V in the 
place of T°(£). 

Here, n must be at least as large as the logarithmic height of <f>, and maybe bigger 
to account for rounding errors. In the case of the classical modular polynomial $e 
the bound 0(£\og£) of Section [2TT1 yields a complexity of 

0(£ 2 log 2 £M{£\og£)) . 

This proves Result 11.11 

4. Further kinds of modular polynomials 

4.1. Modular functions of composite level. Besides its better complexity com- 
pared to the approach using q-expansions, the evaluation-interpolation algorithm 
has the advantage of a great flexibility, which makes it easily adaptable to a variety 
of different modular polynomials. In fact, the proof of Result 11.11 in Section [3] docs 
not use special properties of T°(£) and is valid for any congruence subgroup. 

An application is given by the computation of the polynomial relationship be- 
tween j and a modular function / of composite level N. At first sight, these poly- 
nomials are not of great interest. In the point counting or more generally isogeny 
computation context, it is more efficient to express an isogeny of composite degree 
as a composition of prime degree isogenies. But this kind of modular polynomials 
occurs naturally in the context of [14] . in which elliptic curves with complex mul- 
tiplication are obtained via modular functions / of composite level. Levels N = pq 
or N = p 2 with p and q prime are examined in [14j , but more general settings can 
be devised in a straightforward way. The polynomial relationship between / and 
j is used to derive from a special value of / the j-invariant of the corresponding 
elliptic curve. 

If the c^-expansion approach were to be pursued to compute this kind of modu- 
lar polynomials, it would be necessary to somehow obtain the g-expansions of the 
conjugates f{M v z) for a system of representatives (M„) of T'\T. This is straight- 
forward only for translations; for all other matrices, it is necessary to take the 
transformation behaviour of / under unimodular matrices into account, which re- 
quires ad hoc computations for each particular function. (This is illustrated by 

j(Sz/£) = j(£z) and toj s (Sz) = £ s S , which are not derived from j(z/£) 

resp. Wg(z) by the same generic transformation.) In the case of T°(£), only the 
matrix S asks for special treatment; in the case of composite level, many more 
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special matrices appear. For instance, the full article |15j is concerned with de- 
riving the conjugates of only the functions ^ ^(z^/(pipi))^(z) ) ^ or P 1 ' P 2 P rmie an d 

« = 24/gcd(24,(pi-l)(p2-l)). 

In the evaluation-interpolation approach, the only difference between r°(^) and 
other congruence subgroups V is the enumeration of the system of representatives 
(My). For most interesting T' (and in particular for arbitrary T°(N)), this is easy; 
in any case, this step is independent of the particular function /. Then Result 11.11 
applies, and one obtains an algorithm that is essentially linear in its output size. 



4.2. Schlafli equations. In Schlafli examines transformations of prime level I 
of special modular functions different from j that lead to particularly simple mod- 
ular equations. Weber gives a systematic treatment of them in [331 §§73-74]. Let 
f = C48 1 ^^P)^ = C48^2(^+ 1) be one of "the Weber functions", a modular func- 
tion of level 48. Let £ be a prime not dividing this level. Then g(z) — f(z/£) is the 
root of a monic polynomial $\{X) of degree £ + 1 with coefficients in Z[f]. 

The evaluation-interpolation approach allows to easily obtain this polynomial 
with only minimal modifications to the algorithm. The function f being modular 
for a subgroup of T(48) and £ being different from 2 and 3, a quick computation 
reveals that g is modular for T(48) PI T°(£). The polynomial $j, is thus given by an 
equation analogous to (II. lft : 

£+1 £ 

(4.1) $J(X)= l[(X-g(M vZ ))=X e+1 +Y l CrX e+1 - r , 



where the M v range over a set of representatives of (r(48) PI r°(£))\r(48). Such a 
set may be obtained by multiplying the standard representatives of r°(£)\r from 
the left by a matrix in T°(£) such that they end up in T(48). Precisely, a possible 

set of representatives is given by ^l^ ^ or v = ^' ' ' ' ' ^ — ^ (corresponding to 

the translations) and | ,1?^ , I with k = 48 _1 mod I. corresponding to 

; \ -48fc 1 + 48/c J 

the inversion S. 

So the only modification required for the evaluation-interpolation algorithm to 
work is this adaptation of the matrices M v , and of course j has to be replaced by f 
in the interpolation phase to recover the coefficients c r as elements of Z[f]. 

Hence Result fOI clearly applies also to the Schlafli equations, showing that they 
can be computed in essentially linear time with respect to the output size. 

Taking specifics of the function f into account, a more efficient algorithm can be 
obtained, gaining a constant factor. Weber shows in [32l p. 266] that only every 
24-th coefficient in f of the c r may be non-zero. Precisely, fg k having a non-zero 
coefficient implies £i + k = £ + 1 (mod 24) . Hence the number of evaluations can 
be reduced by a factor of about 24. (In [H Ch. 4.5], similar sparse modular equa- 
tions are studied, in which one out of eight coefficients is non-zero. The Borwein's 
suggest in this chapter the evaluation-interpolation approach while profiting of this 
sparseness.) Weber shows that the <&\ are symmetric, which could also be taken 
into account during the interpolation phase. 
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Weber derives the Schlafli equations in a very compact form. He considers the 
new functions 



fj ■ ' VJ(fg) s 

with r, s such that 12\(£- l)r, 12\(£ + l)s and = (mod 2). Depending 

on I mod 24, these conditions are satisfied by some r and s such that 2r\£ + 1 and 
2s | £ - 1. Weber [32l p. 268] then shows that 

/ (<+l)/(2r)-l (*-l)/(2s)-l 

\ a=0 0=0 

In this form, the polynomial has about £ 2 /{Ars) coefficients, which is often less 
than the roughly £ 2 /24 coefficients if it is written as an equation between f and 
g. However, the more compact form appears to be less suited to the evaluation- 
interpolation algorithm: There is no easy way, in the spirit of (II. ip . of obtaining 
its values when interpreting it as a univariate polynomial in A(z) with coefficients 
in 7j[B{z)]. So instead of performing 0(£) interpolations of polynomials of degree 
0(£), one would apparently have to obtain all coefficients at once by solving a linear 
system with 0{£ 2 ) unknowns, resulting in a complexity that is worse than that of 
Result O 



4.3. Generalised Schlafli equations. The Schlafli equations of the previous para- 
graph can obviously be generalised to further modular functions: Given a modular 
function / and a prime £, one may want to compute the polynomial relating f{z) 
and f(z/£). 

In [24l Ch. 5], a theory analogous to Weber's is developped for the functions ro^. 
The derived polynomials are used to obtain symbolically minimal polynomials of 
special algebraic values of rj quotients. 

In the context of a p-adic algorithm for the computation of class polynomials 
and ultimately elliptic curves with complex multiplications, another application is 
developed in [3J Ch. 6.8]. Let r be a quadratic integer. Then the singular value j(r) 
lies in the ring class field for the order O = [1, t\% and is the j-invariant of an elliptic 
curve with complex multiplication by O. In [5], the authors describe an algorithm 
to compute the canonical lift of j(r) to the p-adic numbers at arbitrary precision 
by some kind of Newton iteration. The main algorithmic step consists of applying 
an isogeny to j(r) (or more precisely to the elliptic curve with this j-invariant) that 
corresponds to a smooth principal ideal and that can thus be realised by composing 
isogenies of manageable prime degrees. In [2], the algorithm is generalised to other 
class invariants. Let / be a modular function for T°(N) such that /(r) is a class 
invariant in the sense that it also lies in the ring class field of O. Let £ be a prime 
number not dividing TV that splits as (£) = 11 in O. As in the case of j(r), one 
needs to "apply an isogeny" and compute /(I -1 ). This value is given as a root 
of the modular polynomial ^ between f(z) and f(z/£), specialised with /(r) in 
the place of f(z). (The modular polynomial <& between f(z) and j(z), treated in 
Section 14. li also plays a role as it helps to choose whenever ^ has multiple roots: 
/([ _1 ) is also a root of $ specialised with j([ _1 ) in the place of j{z).) 

A quick computation shows that g(z) = f(z/£) is modular for T°(£) nr°(A) = 
T°(£N), so that (|4.ip yields the minimal polynomial of g over Cro(jv) if the M„ are 
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chosen as a system of representatives of T°(£N)\T°(N). Such a system is obtained 
precisely as for the Schlafii equations by multiplying the standard representatives 
of r°(£)\r by matrices in T°(£) so that they end up in T°(N). For instance, a set of 



with k = N mod i. So at first sight, the evaluation-interpolation algorithm 
seems to apply. 

The problem lies in recognising the coefficients c r as polynomials in /. In fact, 
the c r are modular for T°(N), so unless Cro(jv) = C(/) (otherwise said, / is a 
Hauptmodul for T°(N)), there is no reason that they are rational in /. Generi- 
cally, one expects C r o(jv) = C(j, /) and may hope (under suitable integrality and 
rationality conditions) to recover the c r as elements of Z[j, /]. 

A necessary condition for / being a Hauptmodul is that the modular curve Xq (N) 
is of genus 0, which limits the possible values of TV to a very short list. In general, 
it will be necessary to replace the minimal polynomial of g over Cro(Ar) by that over 
C(/); its degree will be I + 1 multiplied by [Cr<i(jv) ; It is not clear whether 

the evaluation-interpolation approach can be adapted to this context, since C(/) 
is not the field of modular functions for any congruence subgroup. 



The evaluation-interpolation algorithm has been implemented in C by the au- 
thor for different kinds of modular polynomials. One big advantage of the algorithm 
(besides its superior complexity) is that it is rather straightforward to implement, 
especially in the context of complex multiplication (cf. [10 ) , where the major 
building blocks such as the evaluation of modular functions and computations with 
polynomials over floating point numbers are already in use. The author's implemen- 
tation relies on the existing libraries gmp [22] with an assembly patch for AMD64 
by P. Gaudry [21] for fast basic multiprecision arithmetic, mpfr [23] and mpc |16j . 
which provide elementary operations with arbitrary precision floating point real 
and complex numbers with exact rounding. A library for fast polynomial opera- 
tions via Karatsuba, Toom-Cook and the FFT has been written for the occasion 
[9]. All timings mentioned in the following have been obtained on Opteron proces- 
sors clocked at 2.4 GHz. The heights are given as logarithms in base 2, and the 
arithmetic precision is given in bits. 

5.1. Details of the implementation. 

5.1.1. Details slowing the implementation down asymptotically. Of the different 
techniques for evaluating modular functions described in [10| , the asymptotically 
fast ones are not beneficial in the present context. For computing univariate class 
polynomials, they do not pay off at degrees below 100 000. Such high degrees are 
for the time unachievable for bivariate modular polynomials, since the number of 
coefficients would make it impossible to store the result. Hence all examples have 
been computed via the sparse representation of the n function as a g-series. 

The interpolation phase turns out to take negligible time; thus a simple quadratic 
algorithm using iterated differences is employed instead of a quasilinear one. 

5.1.2. Details speeding the implementation up. In order to work with real instead 
of complex numbers as much as possible, purely imaginary values are chosen for 
the zt- Then the q{zk) are real, and so are the values j(zk) and thus, by (|3.1| . the 
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$>(X,j(zk))- While the values of the conjugates f(M u Zk) still have to be computed 
as complex numbers, roughly half of them suffice for functions / for T (£) with 
rational q-coefficients: f(zk) is real, f(zk — r) is the complex conjugate of f(zk + r) 
for reZ, and finally f(—l/zk) has to be real again to obtain a real polynomial in 
the end. When reconstructing $(AT, j{zk)) from its roots, the complex conjugate 
roots can be grouped so that only real arithmetic is required. 

Also the interpolation phase uses only real numbers. It is furthermore helped 
by choosing the Zk such that the j{zk) lie in a simple arithmetic progression, for 
instance, j( z fc) = 1728 + k for k > 1. 

5.1.3. Parallelisation. Another nice feature of the evaluation-interpolation algo- 
rithm is that it can be parallelised almost arbitrarily, in principal up to distributing 
at the level of each modular function evaluation. A more natural, coarser paral- 
lelisation of the evaluation phase, which needs almost no communication, is amply 
sufficient in practice: Each processor treats a different Zk and computes an approx- 
imation <I>(X, j(zk)) by evaluating the conjugates (f(M u Zk))v and reconstructing 
the polynomial from these roots. The result is written into a file on a shared hard 
disk. The interpolation phase has been too fast to warrant a parallel implemen- 
tation. If it were to become a bottleneck, a natural way of distributing the work 
would be to have each processor compute the coefficients of a different polynomial 

Cr eZ[i]. 

Implementing the communication via files solves a second problem. During the 
evaluation phase, the matrix (c r (zk)) k r is computed row- wise; for the interpolation, 
it is accessed column-wise. However, the matrix is roughly as big as the final 
modular polynomial and for the largest computed examples (see Section 15.2. ip 
does not fit into main memory any more. Writing each row into a different file 
during the evaluation and reading all files in parallel during the interpolation can 
be seen as a means of transposing the matrix on disk without having to keep it in 
main memory. 

5.1.4. Arithmetic precision. Except for the classical modular polynomials, no up- 
per bound on the height of the polynomials is known, and even in the classical case, 
the bound is not completely explicit, but contains an 0(1) term (see Section |2~H . 
In the implementation, an approximate bound is obtained by carrying out all com- 
putations first at very low precision (100 bits). Due to the numeric instability of 
interpolation, the resulting coefficients are wrong already in the first digit, but one 
obtains an idea of the height of the correct polynomial (actually, the height at low 
precision turns out to be larger than the real one). This estimate is multiplied by 
a factor (between 1.1 and 2 depending on the function and the level), determined 
experimentally, to deal with rounding errors. 

In the context of class polynomial computations, it has been observed in |10| 
that virtually no rounding errors occur: Evaluating modular functions via the q- 
development of r\ as well as multiplication of polynomials is rather uncritical from 
a numerical point of view. It suffices to add a few bits to the target precision. This 
should also hold for the evaluation phase in our context. In practice, it can be ob- 
served, however, that a significantly higher precision is needed than just the height 
of the result. This can only be due to the interpolation phase, which implicitly 
handles Vandermonde matrices, that are known to be ill conditioned. It is an open 
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problem to choose the interpolation points j(zk) so as to minimise the rounding 
errors. 



5.2. Examples. 



5.2.1. Modular polynomials for T (£). For achieving the point counting record for 
elliptic curves with the Schoof-Elkies-Atkin algorithm [12], the polynomials have 
been systematically computed with the function 

,,,, f TrWzHtz)) 

(evaluated in —1/z) suggested in [27]. Here, 



is the r-th Hecke operator for a prime r > 5 such that 24|(r — l)(£ + 1), r is a 
quadratic residue modulo I and £ is a quadratic residue modulo r. (The unusual 
factor 24 in front of v takes care of the fact that the exponents in the g-expansion 
of rj are not integral due to the factor q 1 ^ 24 ; it eliminates 24-th roots of unity.) 

It is not advisable to obtain the values of fi r via its g-expansion. First of all, 
the expansion is dense and thus much slower to evaluate than that of rj. But more 
importantly, q tends to infinity when z approaches the cusp of the fundamental 
domain for T°(£), so that the q-series converges arbitrarily slowly. This is not 
an issue when relying on the evaluation of r\ only, since its known transformation 
behaviour under unimodular matrices allows to transform the arguments into the 
fundamental domain for T, whose only cusp is at infinity and corresponds to q = 0. 
So the implementation evaluates (|5.ip numerically with 2r + 4 evaluations of r\ per 
value of fi tT . The resulting dependence of the running time on r can be seen clearly 
in the following examples. 



1 


2039 


I 


2017 


r 


5 


r 


61 


degree in j 


136 


degree in j 


156 


estimated height 


5816 


estimated height 


6211 


precision 


6397 


precision 


6832 


height 


5040 


height 


5311 


time for evaluation 


5900 s 


time for evaluation 


74000 s 


time for interpolation 


110 s 


time for interpolation 


130 s 



The largest computed polynomial has a level of 10079. The computation takes 
less than two weeks on a small cluster with ten processors; as a compressed text 
file, the polynomial fills about 16 Gb of space. This illustrates that the limiting 
factor for the algorithm becomes not so much its running time, but rather the 
space requirements of its output, as can be expected for algorithms that have an 
essentially linear time complexity with respect to their output size. 
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£ 10079 



r 




5 


degree in j 




672 


estimated height 




31865 


precision 




35051 


height 




28825 


time for evaluation 


10 000 000 s r 


i 120 d 


time for interpolation 


56 000 s 


w 16 h 



It would be natural to try to linearly combine functions fi lT with different r to 
reduce the pole order at infinity, or even to find a function on Xq(£) with minimal 
pole order. As a result, the degree of the modular polynomial in j (and thus 
the number of interpolation points) and the size of its coefficients (and thus the 
required precision) could be reduced. A high price, however, would have to be paid 
by an increased running time for the evaluation. As high performance clusters are 
becoming increasingly available, while bandwidth and storage space appear to be 
the bottleneck for modular polynomial computation, this might, however, be the 
road to follow for constructing a manageable database of modular polynomials up 
to a high level. 

5.2.2. Schlafli equations. As explained in Section |4~2"1 a trivial modification to the 
enumeration of the £ + 1 matrices M v adapts the code to computing the Schlafli 
equations between f(z) and f(z/£). It is then imperative to change the interpolation 
code so as to take advantage of the sparseness of the polynomial, in which only every 
24-th coefficient is non-zero. This reduces the number of evaluations and (after a 
suitable transformation) the degree of the interpolated polynomials roughly by a 
factor of 24. The latter is also important since the interpolation phase becomes 
numerically more stable. 



£ 2039 



number of interpolation points 86 


estimated height 


2829 


precision 


3111 


height 


2380 


time for evaluation 


230 s 


time for interpolation 


35 s 



5.2.3. Generalised Schlafli equations. As argued in Section |4.3( the minimal poly- 
nomial of g(z) = f(z/£) over C(/) for some function / for T°(N) need not be of 
degree £ + 1 any more, in which case it is a priori unclear how to obtain it by 
evaluation-interpolation. But besides the functions / that are Hauptmoduln for 
T°(N), a few others may be handled by this approach. Namely, let 

= V(z/pi)y(z/p 2 ) 
PUP2 v(z)v(z/(PiP2)) 

with pi, p 2 prime and 24|(pi — l)(p 2 — 1) be the functions of level TV = pip 2 suggested 
as class invariants in [14j . These functions are invariant under the Fricke-Atkin 
Lehner involution. Now, Xq(N) is of genus for N = 35 and N = 39 [28], and 
apparently, the double r\ quotients are Hauptmoduln in this case. The degree of 
the modular equation of transformation level I thus remains I + 1 (an observation 
made for N = 35 in 2, Ch. 7.4]). 
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For instance, with / = 1x13,13 and g(z) = f(z/£), one obtains the following 
polynomials $™ 313 : 

'i'" ' = g 3 + f 3 -g 2 f 2 -gf + 2(g 2 f + fg 2 ) 

'I'" ' = g li • ./" g\r gf ■ 35 ff 3 / 3 

+5( 5 5 / 4 + g'\f' + g 5 f + gf g A f - g 3 f 4 - g 3 f 2 g 2 f + g 2 f + gf 2 ) 
+l0(-g 5 f 2 - g 2 f + g 4 f 4 + g 4 f + g 2 f - g 4 f - gf 4 + g 2 f 2 ) 

+7(g 7 f + g 6 f 7 - g 7 f - g 5 f 7 + g e f 4 + g A f + g 6 f 2 + g 2 f) 
+7(g 4 f 2 +g 2 f 4 -g 3 f-gf 3 + g 2 f + gf 2 ) 
+U(-g 7 f - gf 7 + g 5 f 4 + g 4 f + g 5 f + g 3 f + g 4 f 3 + g 3 f 4 ) 
+2i( V/ 4 - g 4 f + '/"./•'' + g 3 f - g 6 f - g 5 f + g 6 f 3 + g 3 f) 

+21( ff 5 / 2 + //'./"• + g 5 f + gf 5 - g 4 f - gf 4 - 5 3 / 2 - g 2 f) 
+42( ff 6 / 6 + 5 2 / 2 )-182 ff 4 / 4 

Like the Schlafli equations, the polynomials are symmetric in / and g, but as 
can be seen from these and further examples, they are no more sparse. 
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